June 2016 - This notebook was created by [Jordi Vitrià](http://www.ub.edu/cvub/jordivitria/). Source and [license](./LICENSE.txt) info are in the folder.


In [3]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import seaborn as sns
import bokeh.plotting as bp
import matplotlib.pyplot as plt
from sklearn.datasets.samples_generator import make_regression 
from scipy import stats 
from bokeh.models import  WheelZoomTool, ResetTool, PanTool
%matplotlib inline

Basic Concepts I

What is "learning from data"?

In general Learning from Data is a scientific discipline that is concerned with the design and development of algorithms that allow computers to infer (from data) a model that allows compact representation (unsupervised learning) and/or good generalization (supervised learning).

This is an important technology because it enables computational systems to adaptively improve their performance with experience accumulated from the observed data.

Most of these algorithms are based on the iterative solution of a mathematical problem that involves data and model. If there was an analytical solution to the problem, this should be the adopted one, but this is not the case for most of the cases.

So, the most common strategy for learning from data is based on solving a system of equations as a way to find a series of parameters of the model that minimizes a mathematical problem. This is called optimization.

The most important technique for solving optimization problems is gradient descend.

Preliminary: Nelder-Mead method for function minimization.

See "An Interactive Tutorial on Numerical Optimization": http://www.benfrederickson.com/numerical-optimization/

The most simple thing we can try to minimize a function $f(x)$ would be to sample two points relatively near each other, and just repeatedly take a step down away from the largest value.

The Nelder-Mead method dynamically adjusts the step size based off the loss of the new point. If the new point is better than any previously seen value, it expands the step size to accelerate towards the bottom. Likewise if the new point is worse it contracts the step size to converge around the minima. The usual settings are to half the step size when contracting and double the step size when expanding.

This method can be easily extended into higher dimensional examples, all thats required is taking one more point than there are dimensions - and then reflecting the worst point around the rest of the points to take a step down.

Gradient descend (for hackers) for function minimization: 1-D

Let's suppose that we have a function $f: \Re \rightarrow \Re$. For example:

$$f(x) = x^2$$

Our objective is to find the argument $x$ that minimizes this function (for maximization, consider $-f(x)$). To this end, the critical concept is the derivative.

The derivative of $f$ of a variable $x$, $f'(x)$ or $\frac{\mathrm{d}f}{\mathrm{d}x}$, is a measure of the rate at which the value of the function changes with respect to the change of the variable. It is defined as the following limit:

$$ f'(x) = \lim_{h \rightarrow 0} \frac{f(x + h) - f(x)}{h} $$

The derivative specifies how to scale a small change in the input in order to obtain the corresponding change in the output:

$$ f(x + h) \approx f(x) + h f'(x)$$

In [4]:
# numerical derivative at a point x

def f(x):
    return x**2

def fin_dif(x, f, h = 0.00001):
    '''
    This method returns the derivative of f at x
    by using the finite difference method
    '''
    return (f(x+h) - f(x))/h

x = 2.0
print "{:2.4f}".format(fin_dif(x,f))


4.0000

The limit as $h$ approaches zero, if it exists, should represent the slope of the tangent line to $(x, f(x))$.

For values that are not zero it is only an approximation.


In [5]:
for h in np.linspace(0.0, 1.0 , 5):
    print "{:3.6f}".format(f(5+h)), "{:3.6f}".format(f(5)+h*fin_dif(5,f))


25.000000 25.000000
27.562500 27.500002
30.250000 30.000005
33.062500 32.500007
36.000000 35.000010

In [6]:
x = np.linspace(-1.5,-0.5, 100)
f = [i**2 for i in x]
plt.plot(x,f, 'r-')
plt.plot([-1.5, -0.5], [2, 0.0], 'k-', lw=2)
plt.plot([-1.4, -1.0], [1.96, 1.0], 'b-', lw=2)
plt.plot([-1],[1],'o')
plt.plot([-1.4],[1.96],'o')
plt.text(-1.0, 1.2, r'$x,f(x)$')
plt.text(-1.4, 2.2, r'$(x-h),f(x-h)$')
plt.gcf().set_size_inches((12,6))
plt.grid()
plt.show


Out[6]:
<function matplotlib.pyplot.show>

It can be shown that the “centered difference formula" is better when computing numerical derivatives:

$$ \lim_{h \rightarrow 0} \frac{f(x + h) - f(x - h)}{2h} $$

The error in the "finite difference" approximation can be derived from Taylor's theorem and, assuming that $f$ is differentiable, is $O(h)$. In the case of “centered difference" the error is $O(h^2)$.

The derivative tells how to chage $x$ in order to make a small improvement in $f$.

Then, we can follow these steps to decrease the value of the function:

  • Start from a random $x$ value.
  • Compute the derivative $f'(x) = \lim_{h \rightarrow 0} \frac{f(x + h) - f(x - h)}{2h}$.
  • Walk a small step in the opposite direction of the derivative, because we know that $f(x - h \mbox{ sign}(f'(x))$ is less than $f(x)$ for small enough $h$.

The search for the minima ends when the derivative is zero because we have no more information about which direction to move. $x$ is a critical o stationary point if $f'(x)=0$.

  • A minimum (maximum) is a critical point where $f(x)$ is lower (higher) than at all neighboring points.
  • There is a third class of critical points: saddle points.

If $f$ is a convex function, this should be the minimum (maximum) of our functions. In other cases it could be a local minimum (maximum) or a saddle point.


In [10]:
W = 400
H = 250
bp.output_notebook()


Loading BokehJS ...

In [11]:
x = np.linspace(-15,15,100)
y = x**2

TOOLS = [WheelZoomTool(), ResetTool(), PanTool()]

s1 = bp.figure(width=W, plot_height=H, 
               title='Local minimum of function',  
               tools=TOOLS)
s1.line(x, y, color="navy", alpha=0.5, line_width=3)
s1.circle(0, 0, size =10, color="orange")
s1.title_text_font_size = '12pt'
s1.yaxis.axis_label_text_font_size = "14pt"
s1.xaxis.axis_label_text_font_size = "14pt"

bp.show(s1)



In [12]:
x = np.linspace(-15,15,100)
y = -x**2

TOOLS = [WheelZoomTool(), ResetTool(), PanTool()]


s1 = bp.figure(width=W, plot_height=H, 
               title='Local maximum of function',  
               tools=TOOLS)
s1.line(x, y, color="navy", alpha=0.5, line_width=3)
s1.circle(0, 0, size =10, color="orange")
s1.title_text_font_size = '12pt'
s1.yaxis.axis_label_text_font_size = "14pt"
s1.xaxis.axis_label_text_font_size = "14pt"

bp.show(s1)



In [13]:
x = np.linspace(-15,15,100)
y = x**3

TOOLS = [WheelZoomTool(), ResetTool(), PanTool()]


s1 = bp.figure(width=W, plot_height=H, 
               title='Saddle point of function',  
               tools=TOOLS)
s1.line(x, y, color="navy", alpha=0.5, line_width=3)
s1.circle(0, 0, size =10, color="orange")
s1.title_text_font_size = '12pt'
s1.yaxis.axis_label_text_font_size = "14pt"
s1.xaxis.axis_label_text_font_size = "14pt"

bp.show(s1)


There are two problems with numerical derivatives:

  • It is approximate.
  • It is very slow to evaluate (two function evaluations: $f(x + h) , f(x - h)$ ).

Our knowledge from Calculus could help!

We know that we can get an analytical expression of the derivative for some functions.

For example, let's suppose we have a simple quadratic function, $f(x)=x^2−6x+5$, and we want to find the minimum of this function.

First approach

We can solve this analytically using Calculus, by finding the derivate $f'(x) = 2x-6$ and setting it to zero:

\begin{equation} \begin{split} 2x-6 & = & 0 \\ 2x & = & 6 \\ x & = & 3 \\ \end{split} \end{equation}


In [14]:
x = np.linspace(-10,20,100)
y = x**2 - 6*x + 5
 
TOOLS = [WheelZoomTool(), ResetTool(), PanTool()]


s1 = bp.figure(width=W, plot_height=H, 
               tools=TOOLS)
s1.line(x, y, color="navy", alpha=0.5, line_width=3)
s1.circle(3, 3**2 - 6*3 + 5, size =10, color="orange")
s1.title_text_font_size = '12pt'
s1.yaxis.axis_label_text_font_size = "14pt"
s1.xaxis.axis_label_text_font_size = "14pt"

bp.show(s1)


Second approach

To find the local minimum using gradient descend: you start at a random point, and move into the direction of steepest descent relative to the derivative:

  • Start from a random $x$ value.
  • Compute the derivative $f'(x)$ analitically.
  • Walk a small step in the opposite direction of the derivative.

In this example, let's suppose we start at $x=15$. The derivative at this point is $2×15−6=24$.

Because we're using gradient descent, we need to subtract the gradient from our $x$-coordinate: $f(x - f'(x))$. However, notice that $15−24$ gives us $−9$, clearly overshooting over target of $3$.


In [15]:
x = np.linspace(-10,20,100)
y = x**2 - 6*x + 5
start = 15

TOOLS = [WheelZoomTool(), ResetTool(), PanTool()]


s1 = bp.figure(width=W, plot_height=H, 
               tools=TOOLS)
s1.line(x, y, color="navy", alpha=0.5, line_width=3)
s1.circle(start, start**2 - 6*start + 5, size =10, color="orange")

d = 2 * start - 6
end = start - d

s1.circle(end, end**2 - 6*end + 5, size =10, color="red")
s1.title_text_font_size = '12pt'
s1.yaxis.axis_label_text_font_size = "14pt"
s1.xaxis.axis_label_text_font_size = "14pt"

bp.show(s1)


To fix this, we multiply the gradient by a step size. This step size (often called alpha) has to be chosen carefully, as a value too small will result in a long computation time, while a value too large will not give you the right result (by overshooting) or even fail to converge.

In this example, we'll set the step size to 0.01, which means we'll subtract $24×0.01$ from $15$, which is $14.76$.

This is now our new temporary local minimum: We continue this method until we either don't see a change after we subtracted the derivative step size, or until we've completed a pre-set number of iterations.


In [16]:
old_min = 0
temp_min = 15
step_size = 0.01
precision = 0.0001
 
def f_derivative(x):
    import math
    return 2*x -6

mins = []
cost = []

while abs(temp_min - old_min) > precision:
    old_min = temp_min 
    gradient = f_derivative(old_min) 
    move = gradient * step_size
    temp_min = old_min - move
    cost.append((3-temp_min)**2)
    mins.append(temp_min)

# rounding the result to 2 digits because of the step size
print "Local minimum occurs at {:3.2f}.".format(round(temp_min,2))


Local minimum occurs at 3.00.

An important feature of gradient descent is that there should be a visible improvement over time: In this example, we simply plotted the squared distance from the local minima calculated by gradient descent and the true local minimum, cost, against the iteration during which it was calculated. As we can see, the distance gets smaller over time, but barely changes in later iterations.


In [17]:
TOOLS = [WheelZoomTool(), ResetTool(), PanTool()]


x, y = (zip(*enumerate(cost)))
s1 = bp.figure(width=W, 
               height=H, 
               title='Squared distance to true local minimum',  
#                title_text_font_size='14pt', 
               tools=TOOLS,
               x_axis_label = 'Iteration',
               y_axis_label = 'Distance'
)
s1.line(x, y, color="navy", alpha=0.5, line_width=3)
s1.title_text_font_size = '16pt'
s1.yaxis.axis_label_text_font_size = "14pt"
s1.xaxis.axis_label_text_font_size = "14pt"


bp.show(s1)


From derivatives to gradient: $n$-dimensional function minimization.

Let's consider a $n$-dimensional function $f: \Re^n \rightarrow \Re$. For example:

$$f(\mathbf{x}) = \sum_{n} x_n^2$$

Our objective is to find the argument $\mathbf{x}$ that minimizes this function.

The gradient of $f$ is the vector whose components are the $n$ partial derivatives of $f$. It is thus a vector-valued function.

The gradient points in the direction of the greatest rate of increase of the function.

$$\nabla {f} = (\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n})$$

In [16]:
def f(x):
    return sum(x_i**2 for x_i in x)

def fin_dif_partial_centered(x, f, i, h=1e-6):
    w1 = [x_j + (h if j==i else 0) for j, x_j in enumerate(x)]
    w2 = [x_j - (h if j==i else 0) for j, x_j in enumerate(x)]
    return (f(w1) - f(w2))/(2*h)

def fin_dif_partial_old(x, f, i, h=1e-6):
    w1 = [x_j + (h if j==i else 0) for j, x_j in enumerate(x)]
    return (f(w1) - f(x))/h

def gradient_centered(x, f, h=1e-6):
    return[round(fin_dif_partial_centered(x,f,i,h), 10) for i,_ in enumerate(x)]

def gradient_old(x, f, h=1e-6):
    return[round(fin_dif_partial_old(x,f,i,h), 10) for i,_ in enumerate(x)]

x = [1.0,1.0,1.0]

print f(x), gradient_centered(x,f)
print f(x), gradient_old(x,f)


3.0 [2.0000000001, 2.0000000001, 2.0000000001]
3.0 [2.0000009999, 2.0000009999, 2.0000009999]

The function we have evaluated, $f({\mathbf x}) = x_1^2+x_2^2+x_3^2$, is $3$ at $(1,1,1)$ and the gradient vector at this point is $(2,2,2)$.

Then, we can follow this steps to maximize (or minimize) the function:

  • Start from a random $\mathbf{x}$ vector.
  • Compute the gradient vector.
  • Walk a small step in the opposite direction of the gradient vector.

It is important to be aware that this gradient computation is very expensive: if $\mathbf{x}$ has dimension $n$, we have to evaluate $f$ at $2*n$ points.

How to use the gradient.

$f(x) = \sum_i x_i^2$, takes its mimimum value when all $x$ are 0.

Let's check it for $n=3$:


In [18]:
def euc_dist(v1,v2):
    import numpy as np
    import math
    v = np.array(v1)-np.array(v2)
    return math.sqrt(sum(v_i ** 2 for v_i in v))

Let's start by choosing a random vector and then walking a step in the opposite direction of the gradient vector. We will stop when the difference between the new solution and the old solution is less than a tolerance value.


In [19]:
# choosing a random vector

import random
import numpy as np

x = [random.randint(-10,10) for i in range(3)]
x


Out[19]:
[-2, 5, 4]

In [19]:
def step(x,grad,alpha):
    return [x_i - alpha * grad_i for x_i, grad_i in zip(x,grad)]

tol = 1e-15
alpha = 0.01
while True:
    grad = gradient_centered(x,f)
    next_x = step(x,grad,alpha)
    if euc_dist(next_x,x) < tol:
        break
    x = next_x
print [round(i,10) for i in x]


[-0.0, 0.0, 0.0]

Alpha

The step size, alpha, is a slippy concept: if it is too small we will slowly converge to the solution, if it is too large we can diverge from the solution.

There are several policies to follow when selecting the step size:

  • Constant size steps. In this case, the size step determines the precision of the solution.
  • Decreasing step sizes.
  • At each step, select the optimal step.

The last policy is good, but too expensive. In this case we would consider a fixed set of values:


In [17]:
step_size = [100, 10, 1, 0.1, 0.01, 0.001, 0.0001, 0.00001]

Learning from data

In general, we have:

  • A dataset $(\mathbf{x},y)$.
  • A target function $f_\mathbf{w}$, that we want to minimize, representing the discrepancy between our data and the model we want to fit. The model is represented by a set of parameters $\mathbf{w}$.
  • The gradient of the target function, $g_f$.

In the most common case $f$ represents the errors from a data representation model $M$. To fit the model is to find the optimal parameters $\mathbf{w}$ that minimize the following expression:

$$ f_\mathbf{w} = \sum_{i} (y_i - M(\mathbf{x}_i,\mathbf{w}))^2 $$

For example, $(\mathbf{x},y)$ can represent:

  • $\mathbf{x}$: the behavior of a "Candy Crush" player; $y$: monthly payments.
  • $\mathbf{x}$: sensor data about your car engine; $y$: probability of engine error.
  • $\mathbf{x}$: finantial data of a bank customer; $y$: customer rating.

If $y$ is a real value, it is called a regression problem.

If $y$ is binary/categorical, it is called a classification problem.

Let's suppose that $M(\mathbf{x},\mathbf{w}) = \mathbf{w} \cdot \mathbf{x}$.

Batch gradient descend

We can implement gradient descend in the following way (batch gradient descend):


In [21]:
# f = 2x
x = range(100)
y = [2*i for i in x]

# f_target = Sum (y - wx)**2
def target_f(x,y,w):
    import numpy as np
    return np.sum((np.array(y) - np.array(x) * w)**2.0)

# gradient_f = Sum 2wx**2 - 2xy
def gradient_f(x,y,w):
    import numpy as np
    return np.sum(2*w*(np.array(x)**2) - 2*np.array(x)*np.array(y))

def step(w,grad,alpha):
    return w - alpha * grad

def min_batch(target_f, gradient_f, x, y, toler = 1e-6):
    import random
    alphas = [100, 10, 1, 0.1, 0.001, 0.00001]
    w = random.random()
    val = target_f(x,y,w)
    print "First w:", w, "First Val:", val, "\n"
    i = 0
    while True:
        i += 1
        gradient = gradient_f(x,y,w)
        next_ws = [step(w, gradient, alpha) for alpha in alphas]
        next_vals = [target_f(x,y,w) for w in next_ws]
        min_val = min(next_vals)
        next_w = next_ws[next_vals.index(min_val)]   
        next_val = target_f(x,y,next_w)
        print i, "w: {:4.4f}".format(w), "Val:{:4.4f}".format(val), "Gradient:", gradient        
        if (abs(val - next_val) < toler) or (i>200):
            return w
        else:
            w, val = next_w, next_val
            
min_batch(target_f, gradient_f, x, y)


First w: 0.537744150296 First Val: 702075.399017 

1 w: 10.1404 Val:702075.3990 Gradient: -960263.416501
2 w: -43.3175 Val:21758362.0058 Gradient: 5345786.43966
3 w: 254.2824 Val:674324036.7612 Gradient: -29759993.1096
4 w: -1402.4564 Val:20898305967.7201 Gradient: 165673881.641
5 w: 7820.6086 Val:647669619517.2316 Gradient: -922306499.096
6 w: -43524.1942 Val:20072245888902.4648 Gradient: 5134480280.47
7 w: 242312.3230 Val:622068787671216.2500 Gradient: -28583651721.3
8 w: -1348939.5683 Val:19278837990355840.0000 Gradient: 159125189133.0
9 w: 7509559.7107 Val:597479895510895232.0000 Gradient: -885849927902.0
10 w: -41805705.7756 Val:18516791609447063552.0000 Gradient: 4.93152654863e+12
11 w: 232732377.1867 Val:573862943479471079424.0000 Gradient: -2.74538082962e+13
12 w: -1295621130.6642 Val:17784867100351661080576.0000 Gradient: 1.52835350785e+14
13 w: 7212722847.5418 Val:551179513107010602139648.0000 Gradient: -8.50834397821e+14
14 w: -40153228079.1312 Val:17081873817481277608034304.0000 Gradient: 4.73659509267e+15
15 w: 223533020729.6573 Val:529392704513859426801680384.0000 Gradient: -2.63686248809e+16
16 w: -1244408326388.8684 Val:16406668178621525442665906176.0000 Gradient: 1.46794134712e+17
17 w: 6927621153019.9668 Val:508467076384399287019479498752.0000 Gradient: -8.17202947941e+17
18 w: -38566066958849.0234 Val:15758151804629279453987932733440.0000 Gradient: 4.54936881119e+18
19 w: 214697294759925.6562 Val:488368588313498552411898467319808.0000 Gradient: -2.53263361719e+19
20 w: -1195219839928493.0000 Val:15135269732663318659071238247284736.0000 Gradient: 1.40991713469e+20
21 w: 6653788848881935.0000 Val:469064545431868147333998999693688832.0000 Gradient: -7.84900868881e+20
22 w: -37041642521725720.0000 Val:14537008700041751455672985178320404480.0000 Gradient: 4.36954313706e+21
23 w: 206210823918447040.0000 Val:450523545220248115912105340142023081984.0000 Gradient: -2.4325246644e+22
24 w: -1147975656753994496.0000 Val:13962395495934315833853726808996872978432.0000 Gradient: 1.35418648067e+23
25 w: 6390780481149489152.0000 Val:432715426425897876149321038984062890409984.0000 Gradient: -7.5387561379e+23
26 w: -35577474938559205376.0000 Val:13410495378208531640627431947466182905298944.0000 Gradient: 4.19682554197e+24
27 w: 198059802982959153152.0000 Val:415611219998300446792713742097166292834320384.0000 Gradient: -2.33637277922e+25
28 w: -1102598923206133678080.0000 Val:12880410552853914352961217580375380193928282112.0000 Gradient: 1.30065872619e+26
29 w: 6138168205488546643968.0000 Val:399183101964256106651686112559682512764197142528.0000 Gradient: -7.24076712869e+26
30 w: -34171182399954737954816.0000 Val:12371278713511122050439576787785725588337812570112.0000 Gradient: 4.03093506054e+27
31 w: 190230972420548059987968.0000 Val:383404348165714053705308891791603545306979859693568.0000 Gradient: -2.24402154821e+28
32 w: -1059015823465191114276864.0000 Val:11882271638729903212789744637365994431105136980918272.0000 Gradient: 1.24924679589e+29
33 w: 5895541089230718165319680.0000 Val:368249290786709828394957205609238130330071642564198400.0000 Gradient: -6.9545569127e+29
34 w: -32820477243747407956541440.0000 Val:11412593844674114042320646538958648569660935508569620480.0000 Gradient: 3.8716018333e+30
35 w: 182711596815941843158040576.0000 Val:353693276598685574796994812618466860314346788985350651904.0000 Gradient: -2.1553207406e+31
36 w: -1017155459474348353595113472.0000 Val:10961481291082122781032637457906487780365530982543779692544.0000 Gradient: 1.19986705629e+32
37 w: 5662504442893697979630223360.0000 Val:339712626856277485785310924822348683426683296354128667607040.0000 Gradient: -6.67965990237e+32
38 w: -31523162233589217757610639360.0000 Val:10528200138377430540702739634933474891306617271525467163197440.0000 Gradient: 3.71856666765e+33
39 w: 175489444154391207560270053376.0000 Val:326284598778322680254719033947144769206555100315396398668316672.0000 Gradient: -2.07012606388e+34
40 w: -976949735607495760797549723648.0000 Val:10112045553907802931932953846292251802627246642979382343413792768.0000 Gradient: 1.15243917976e+35
41 w: 5438679178126929333549948469248.0000 Val:313387348551432601004385139576896034868197203027631614634068279296.0000 Gradient: -6.41562891373e+35
42 w: -30277126984632617419326812585984.0000 Val:9712340565370889114305421825807851562122155052713756596061681483776.0000 Gradient: 3.57158061628e+36
43 w: 168552765923449787847727013429248.0000 Val:300999895795945782342680898554596202474387965183897664738750856429568.0000 Gradient: -1.98829892908e+37
44 w: -938333247895845144156336186982400.0000 Val:9328434959561199414153925300146659607567879309656088092976379601092608.0000 Gradient: 1.10688601382e+38
45 w: 5223701191036169967958639379480576.0000 Val:289102089436456389794156905377798320071673590813857791511111142465339392.0000 Gradient: -6.16203443893e+38
46 w: -29080344530498361506675405591937024.0000 Val:8959704224646958723990552591534428249385285668718865325340143324095315968.0000 Gradient: 3.43040457215e+39
47 w: 161890278001284387952394948669603840.0000 Val:277674574921399830193873945474887119867809927810385781146920243152065396736.0000 Gradient: -1.90970622532e+40
48 w: -901243177633150324605823706168557568.0000 Val:8605548534256235804561321008150079454672156072721730298011792602551168270336.0000 Gradient: 1.06313345563e+41
49 w: 5017220769883749347282393822792777728.0000 Val:266698762738368384719340895471454451244060821726614692504141369552383743361024.0000 Gradient: -5.91846394752e+41
50 w: -27930868025942838313856247993714343936.0000 Val:8265391771719757045919710385942636302160476666763585897178840881680979354189824.0000 Gradient: 3.29480887958e+42
51 w: 155491142300423832027022009093526126592.0000 Val:256156798173943561875002090157495389238582328247983202988627461379812071766491136.0000 Gradient: -1.83422010326e+43
52 w: -865619189186459688782137655492360536064.0000 Val:7938680592882996321452431810990299064521927571480391882332031235453331270602326016.0000 Gradient: 1.02111033149e+44
53 w: 4818902026201022488444069534029355941888.0000 Val:246031532268847031636148622676533792619133494272593449856972289020250584933268979712.0000 Gradient: -5.68452121539e+44
54 w: -26826827579861093165144494066103280992256.0000 Val:7624883525963121569972418861190897376415362494506483572189162212349189492016951590912.0000 Gradient: 3.16457296061e+45
55 w: 149344949137086745482047303128799013502976.0000 Val:236306493921167310932907649839565787478123278540595887599075036805610219575228271427584.0000 Gradient: -1.76171776717e+46
56 w: -831403331846161951035640134666000567107584.0000 Val:7323490106986425624216852215756220309720157475816236193238565286298555380641268536180736.0000 Gradient: 9.80748280983e+46
57 w: 4628422348387582931342145499950308300685312.0000 Val:226965863092278657931834416096377626364869584395306655362931569756869477510133239651500032.0000 Gradient: -5.45982568023e+47
58 w: -25766427213473680203216925180526474106503168.0000 Val:7034010049399858936451778871301979446430129711602074138963528716222589015811678636980830208.0000 Gradient: 3.03948495619e+48
59 w: 143441700297407988035535590748628458407985152.0000 Val:217994445071865274206529428511529048396246783173890897787237575623311276386594647545024086016.0000 Gradient: -1.69208127511e+49
60 w: -798539945555670387721087348751402823915143168.0000 Val:6755972446505818194185215610419070199506790034004383780173659143971010902860725260509566205952.0000 Gradient: 9.41981645853e+49
61 w: 4445471876908416743731780240638417135965569024.0000 Val:209377645760188188300350136364184609654646365649388253087949551805255253622074609324992711622656.0000 Gradient: -5.24401182246e+50
62 w: -24747941938749154607797955546755891337573171200.0000 Val:6488925005422768325380689077533916127553046372415050352854368387380550331495787208322459499495424.0000 Gradient: 2.91934138157e+51
63 w: 137771792773016550166629279692759994709456191488.0000 Val:201101447927384674207744803354535702158857549107614899044887745285733979561602503739618520995987456.0000 Gradient: -1.62519734712e+52
64 w: -766975570367383193474918593017529097450887839744.0000 Val:6232433311325615356787629509753591996158949059967472722395803970627977286178462499308352364796182528.0000 Gradient: 9.0474736314e+52
65 w: 4269753000235224186646527249351865454969418153984.0000 Val:193152388411181413324827220274715960599866120197836921261832334686623177939581967566131709941114732544.0000 Gradient: -5.0367285706e+53
66 w: -23769714952309492807566524597222910621845607677952.0000 Val:5986080120768859547223117525507045080474001929641755840249873869611185183789705221364432163318643294208.0000 Gradient: 2.80394679525e+54
67 w: 132326003139506941116849375000402313673933775175680.0000 Val:185517536215926809289953510844833193642340735404638255450046466836388667132715985702806223881746157404160.0000 Gradient: -1.56095718092e+55
68 w: -736658859477635038078484860125562977607132228091904.0000 Val:5749464682942997643211811601429638934605262215694978619119774917438987844929876700242783901713395263471616.0000 Gradient: 8.68984862617e+55
69 w: 4100979870711996096052934134486676500812460952911872.0000 Val:178184471477316325956704252197550522775658135424472460908960549283597853425358111022071140535628239598518272.0000 Gradient: -4.83763873019e+56
70 w: -22830154940253691489575133080326112223088257880031232.0000 Val:5522202087760068191513424223737343270163395517866576057199025752453613069348218525882316154152771847238713344.0000 Gradient: 2.6931134811e+57
71 w: 127095472552392323791930060068912737799978728885846016.0000 Val:171141265258593349035392589537980293384794749098481212981641704921866118110869349193995357272757129753221136384.0000 Gradient: -1.49925627493e+58
72 w: -707540495699168089582537355364659457009650324245315584.0000 Val:5303922639707778617075724204967883216611540945237744852830209731305441358668540025326465417558511578778924220416.0000 Gradient: 8.34635968252e+58
73 w: 3938877939557269224550863583400842624227212566487629824.0000 Val:164376460145354611854398465299558677870439663945612346922228994110624186626952566962495960856595023603287845765120.0000 Gradient: -4.64641843526e+59
74 w: -21927733489515321169773244173600233180478223741546921984.0000 Val:5094271256453697328391748877173120640624009714176978253844658788831285182463897445027492899899837097377765861097472.0000 Gradient: 2.58666114291e+60
75 w: 122071692336131809323792887614623852265601384326423904256.0000 Val:157879051607400989706347611497782054680611940937654657592597109437942824276470779748277829259697195781945628328722432.0000 Gradient: -1.43999425826e+61
76 w: -679573111235245909841938966101151521265088004372320747520.0000 Val:4892906891221201016990965838568141939208584155066516616262655449693460102842182370751354187665718179860400028229042176.0000 Gradient: 8.01644803571e+61
77 w: 3783183510246613811092239516670232024432379885427345063936.0000 Val:151638470097306122207304637683665599499147729498586119373783139938492462492320221140541413368456117399996161313598341120.0000 Gradient: -4.46275662148e+62
78 w: -21060982601542906526436614249082327274869067534483637403648.0000 Val:4699501977997490528902710137445908300374516376470979198722481130583112264147712845065521813909939335698417339990880026624.0000 Gradient: 2.48441661118e+63
79 w: 117246490142789381517320509948569887987396261299660959776768.0000 Val:145644563856587588218911670543288619981143857476657862673558667928872664698033977620812254207392923781982516875018471735296.0000 Gradient: -1.38307472744e+64
80 w: -652711210624908318179485106800993877566590535797709408829440.0000 Val:4513741898671232477861429693902969943607140692557907004575308665107202647491610836963139835954120324002712989078594105901056.0000 Gradient: 7.69957700768e+64
81 w: 3633643309548865403887812081074991532069066211785321272573952.0000 Val:139887582401508562724478571950973213579571372822735327500017158159640603203648184566911284285270218670222376958077395470909440.0000 Gradient: -4.28635452017e+65
82 w: -20228492304258531165083428377794829396866894935081783467704320.0000 Val:4335324471232947726560938750741169430550584332659314373265981632473881705971297302746250507706740520117596344798924072101609472.0000 Gradient: 2.38621356138e+66
83 w: 112612016657807248262563431864175776357433231372237341817044992.0000 Val:134358160661646677011593636928268962129054286957018640946289678204179316319332260775257273414685559401897583345172618485717532672.0000 Gradient: -1.32840508962e+67
84 w: -626911096734013025956813575313345746823745900247558117079908352.0000 Val:4163959458205656465472757176285676282405610892773777574000914821041598286685366076415131278846105745675114191272974044592349380608.0000 Gradient: 7.39523113392e+67
85 w: 3490014075518251137187840153402224603460229143367045523926155264.0000 Val:129047303745426601282332292239891565830381273358798415880388509513704951094012927735683679121297803042093472517810095963004662185984.0000 Gradient: -4.11692517225e+68
86 w: -19428908358410104701132151180958473144927307129186484275640270848.0000 Val:3999368094506048275282933612558822684449103784979248352111180122672302388514898844345886370715981992865533025214556516464054975332352.0000 Gradient: 2.29189224339e+69
87 w: 108160732831269073381916663926287374068323261128497055813846695936.0000 Val:123946372307835175283963278820832358612742579973763828094360699697112326755087573664740021815793666883927877059382315448692451116580864.0000 Gradient: -1.2758964119e+70
88 w: -602130799671675144148080280076045575056054536928927073417998368768.0000 Val:3841282633968180094669342408102177752622811241712052422657526139698079014786443218684198128585709105932515593761429391676000456994193408.0000 Gradient: 7.10291532503e+70
89 w: 3352062161772216553656968561252844315545138789825161571013096374272.0000 Val:119047068496515945935141280146987297125771891248848551967407109830554157875683196172982394817735600968542182994408966598279971145529688064.0000 Gradient: -3.95419296144e+71
90 w: -18660930054585927055297768432820833089942572228670776665644703154176.0000 Val:3689445913792022757614686918127719822581871521377292591768567782738790340075005733663899586660832766220612744502944966244102147910584500224.0000 Gradient: 2.20129922164e+72
91 w: 103885397613879896911098395028040538897493275867780875498915580870656.0000 Val:114341422453380413872944465408080410654915756181243685462010788705110019065883625173894305830393268724753438490009756237972538681821512073216.0000 Gradient: -1.22546327668e+73
92 w: -578330008516469397406239379007032010435154391989466363090664064811008.0000 Val:3543610936208294382789047698371914293016478241540182868555657852895647262846857732843098105197730279581410938907427491169796937299251576176640.0000 Gradient: 6.8221540613e+73
93 w: 3219563157411185013144015238877510748014329084968201849731281854660608.0000 Val:109821779349779089654991920298255159641752333466148533965661526503980104240964830841014400900110451436775964323213904740080217360224901383847936.0000 Gradient: -3.79789316593e+74
94 w: -17923308097308069604448187009376410925142243471337352817590021021237248.0000 Val:3403540466679104805953356841184471378267948311891390815230364501419621143212133535479561411790865649221799757100635257893837796542313899877203968.0000 Gradient: 2.11428712547e+75
95 w: 99779056177714014574968991405188879408120559498736831208936912405397504.0000 Val:105480786934140386033478970537703575620024855230536759560352313960630236283492669169280732976687762286945253749054082764227156335771503556558848000.0000 Gradient: -1.17702364275e+76
96 w: -555470005741334045772023907982814635531938702213816400123802935636262912.0000 Val:3269006647980754226449727187203446287380807326222425523192218715006623334802030860987130752290206516621098032131484963837429647726675339616345653248.0000 Gradient: 6.55249061919e+76
97 w: 3092301521962006479710422581074568570574001354799827219242697680056483840.0000 Val:101311383571822482637146407462517456958658480360308023987024609183706559926468926700953994676420292399276782952529364812051683396507715976362635296768.0000 Gradient: -3.6477715277e+77
98 w: -17214842572762491983140763217163080724903871231004103530678536393614950400.0000 Val:3139790629540916628792804788340453115486480709183088724080844979945841944433085082782739343021219029666531026861852722114455201549929151283586739994624.0000 Gradient: 2.03071440947e+78
99 w: 95835028602568814526145615913995505629013361209448979908413695504173498368.0000 Val:97306786757720413350244994521053520161229209720885471794054290255590425376397129301694628556554346599460569028134768974400207568365486952192759136518144.0000 Gradient: -1.13049871175e+79
100 w: -533513604230500639516325604104736468449575934797707846573944485913699549184.0000 Val:3015682211427238894586465512416463388065431434687251601517583483905876246618467601831557397498651971679086764649619349376300138897060752420627978276306944.0000 Gradient: 6.29348632833e+79
101 w: 2970070234751197951736698318492110168983632586050115993726897441197315325952.0000 Val:93460482082942970218400005298030157159899030362016113674147404485296638835367593099653129041775267955527056022623921752861765235077306737732327251564822528.0000 Gradient: -3.50358383898e+80
102 w: -16534380996859917385961075658343078511029394511440196057938586039449938821120.0000 Val:2896479502408225620227444040505946089483947017592525153745507812159181727507071137169502780042147241203703255239520425314915782213981495260871191699065405440.0000 Gradient: 1.95044512316e+81
103 w: 92046899009519191358659649469946680117482956204211816536614367494766384381952.0000 Val:89766212637609993381548878865235177214039423313847104601943853562474978975865096452985709647083765604093684119911935143721732055084434476921259591869325639680.0000 Gradient: -1.08581280006e+82
104 w: -512425086785993455947234117065425182294324170039408288924946576644324691279872.0000 Val:2781988591530152547633528331184297389953275611082308267822428231446452023511230624760472191438673126435817475971033927529118856255050699432985624450550447734784.0000 Gradient: 6.04471985796e+82
105 w: 2852670458137626315803101179367888240015083666097343173800155631813719809327104.0000 Val:86217968832532265702149678358164992394911722738649665825066674159013212517786522934511710088188889317618670258396181006026565896425665790470090896299711279398912.0000 Gradient: -3.36509554492e+83
106 w: -15880816440452166645006856307243603539655023852722436934881537242390214706462720.0000 Val:2672023232675768473036199391573364947602149209570429599202466678555342576832238050102604541291063656717775411475917184915407749044575619969154146766922114207318016.0000 Gradient: 1.87334868986e+84
107 w: 88408505123997217455564074113434228026101964913259068806233688889580048357523456.0000 Val:82809978623215518216088352856899944429012999799609350644034596704844402570813607846946257974241071711335892983396205627555934190286457148358865766141357743409201152.0000 Gradient: -1.04289321564e+85
108 w: -492170148025292587493056998863236855543410371127774347986811655529565818489667584.0000 Val:2566404541591619244142455854767840518476342780012358164149360855917390198425133365353293652125049828380787248898554517399587860566748473440737142280042045925620711424.0000 Gradient: 5.80578653149e+85
109 w: 2739911214056804866844931247869183587802672891405897385052305602515481426288181248.0000 Val:79536698120286721410880625107864874593731926134000059655018542030858674826456821769501092341790257956178028862743653008289121627869759386672051959118380984151840915456.0000 Gradient: -3.23208136208e+86
110 w: -15253085728654233783076077276443284236561558346343582649237052112483691914739056640.0000 Val:2464960704891189489815966509742676566778168881650618669843876777627198133999241714161856297503583055750468225706084680801471552358867355939293227813729979195722263691264.0000 Gradient: 1.79929969427e+87
111 w: 84913928251418150423773289420288077597004892098508191777949109239591492756434845696.0000 Val:76392802571067525258437658368728351079967703855907550684741779361960645765493659362600090481499539037296449086828247490067623058525407028322512316965424114392929191067648.0000 Gradient: -1.0016701398e+88
112 w: -472715838575644878403156676164274181460311817263875718008856151594468920647558889472.0000 Val:2367526700560412944430601570536511041110302480685629826718245369596519212005128721305380712298151401452723119917208992152713147634319272437472405336400630866558098627100672.0000 Gradient: 5.57629766827e+88
113 w: 2631609073350615371996272635534338864935204754533141645824973013513939232194276884480.0000 Val:73373177697624350148270175647465480555685048731992977035870380383502297722205543074702375042636029002879069085280035912501772511068083539103906408725354775586411586776989696.0000 Gradient: -3.10432491193e+89
114 w: -14650167711342873895177370261591032385123312397996563749380429849801763423837688430592.0000 Val:2273944029510971102913757956981008846197545599852328560549893564476066410969873941782995834998225991741239573532650936202665099963036315232347728999207956944601716623962800128.0000 Gradient: 1.72817767847e+90
115 w: 81557483649045802637089770217725222508041953936428634121427964060016721355218712788992.0000 Val:70472911377204929516290958486586607292646953802544311292663048003310592052333682190790078640008403772773174496983810426353325037883107217699403144967725801864223195166098325504.0000 Gradient: -9.62076513604e+90
116 w: -454030511474237930247659822063384171053605511827434629519668133764862797915666438422528.0000 Val:2184060457744622041062764936759903567207648073251969054800975548649920188621000834672761780105339951902481347559593494620146335729272351693953830658661706105281716446687578292224.0000 Gradient: 5.35587995123e+91
117 w: 2527587857377083593944525721949268980859406070065038115355298018219510134318552120819712.0000 Val:67687285651527407311574595368870239520819102758799710424360110388842347267958343463943194291622725468310048968836983332562024714365905346795358789929450785928248521705279702695936.0000 Gradient: -2.98161836885e+92
118 w: -14071081602018223686811342988219017444478951430842708965644772796095809301402590608621568.0000 Val:2097729768709171189883230090515368910007180689819080060919479710518920161536525876450205963780012171906995665195156073000250706510116332124061846487203640070528749546728717992067072.0000 Gradient: 1.65986694594e+93
119 w: 78333711278435476387277160499262052458321485679059737362737979830291335912026798097956864.0000 Val:65011769051922826902405501034962100790259357065848678047198095587237383269884267396100190485825435864444549972893741028634310142655341331002937787796620987377766735793407877093261312.0000 Gradient: -9.24047928805e+93
120 w: -436083770687050378874193216515926917697149565181367862900796565818897055550079579275132928.0000 Val:2014811525443207502222111085180456347334668203974934091177823949622505315779740900070920831164488729290342632951964278435849538620661383411996024451926519029564161761371739184766648320.0000 Gradient: 5.14417481965e+94
121 w: 2427678351414809827092596852758463793972570424058139932430741424942681844160622675779125248.0000 Val:62442009227846412767874150198567878240166634684335149477542129712040647354454858286570339296736775212323971070976044067735534116892725474269899119172561033149429729778375997277996056576.0000 Gradient: -2.8637621221e+95
122 w: -13514885382326247535001557734291345986247070306899563104524509319485952026004229394953928704.0000 Val:1935170842122701352239030399687437507965002408325833989066935772414904004522110183156640917652861520802803000137201241188593233702139882812019054604588267920457376743656176824109817135104.0000 Gradient: 1.59425637337e+96
123 w: 75237366923410225759062089153951596572201198328561306539752184330786214881915953433408438272.0000 Val:59973825866766431245408931676300548468007717752248303137815048611975444510277751654126732691381506666364644952699626188079706735286546542673292931074652915774794261514476845892518574817280.0000 Gradient: -8.87522523057e+96
124 w: -418846421662624810431535101971670682791587993619942240531046834928384235745866162479042658304.0000 Val:1858678164637807456435467021570459038773665256093877048832010867734824179434992942692259049740922595561985012303268291293795544932730038422295301300299076209667000609907293150855516085288960.0000 Gradient: 4.94083788586e+97
125 w: 2331718029395832439748752451773185345343881814491578896877311828315579949314751542726745915392.0000 Val:57603203893912839631899903665693758446666637534879244633959247342991842562579349757912905758289984787270540849811204587678070746798812456321517892621918053060679728488215519002329260687884288.0000 Gradient: -2.75056445106e+98
126 w: -12980674269646598980876534130156582364935913405427875794551542918104444769864588607926246047744.0000 Val:1785209059842956557867724403701972994230259147648912180983295257316811686965672620732975855018184189478630471245019991753661788959490356994587228578593599969991071669329063444969007757798670336.0000 Gradient: 1.5312392299e+99
127 w: 72263413659122642375039330127150630820196355387436030449433656305361835690031558034789167529984.0000 Val:55326286940823331341443509504415843501587507143875562931741582707344932315551356371877086197181369581252128704079350270008109434801257252347632789881563436021115141661273498258149695790497398784.0000 Gradient: -8.52440879288e+99
128 w: -402290423840335762833415746639237880988510054117109225794738518608460636552845960417735725809664.0000 Val:1714644013137371629354531466116573169492744781561440363817096614951025041491920552135771353914927150136622569545716333991928934169163802477331625272661888409640142911144554440981699072045237665792.0000 Gradient: 4.74553837499e+100
129 w: 2239550789519148808768260017022080488150399479658966285972643883942629960691147486804751761276928.0000 Val:53139371072062706404535361642287550311428333804042887146256324290542630121322012975830401791484191042958932195382683763118444580092125989083976366982683940710777566147002229829097825591335631978496.0000 Gradient: -2.64184121336e+101
130 w: -12467579245253102855020343590868698399566993573290461461434101931823320666885340007074543261712384.0000 Val:1646868234046748759716470163873547870688611184203686841621614051591176618088399924487277818614859107266124716660466076890105785088396622936092732619184681581831921075603323276108427914236294202392576.0000 Gradient: 1.47071300348e+102
131 w: 69407013658324020279598882928228654443221532108555333033650571407499739016890258313655675449769984.0000 Val:51038898759909259277642941389258449477539736718195619384890173882799703913024526956385220935607071312000215135883857973661963000800699397126588503392520324684694049345266549023607479599870211428188160.0000 Gradient: -8.18745929036e+102
132 w: -386388845035889973289795315937586427897265881865150520220262972607839585305843741378511205740576768.0000 Val:1581771469489841216497897109553288765619691843091517846808424568680547819706168537582619676850737841411081106948839576988726555475200560654373247345963074822043511867861838190947722781930239056144236544.0000 Gradient: 4.55795858694e+103
133 w: 2151026700314799732212119635353863960045831067518406732099562274835738013077028653594528468393000960.0000 Val:49021453097208296749022468487966996440332023321314544352321424970894720296160996653768764028632884534810718510244605481042022886120733817549553126582380198572689165922600271203402970538282916127814189056.0000 Gradient: -2.53741554535e+104
134 w: -11974765640652492249803155631240180684957230134039865534200298964586788498140617495615899244032950272.0000 Val:1519247824426147013357874434608188020230135840820436753263973481702077706367992428150761105550075687228038885962484150247437930218419270207963107518005701891509006186167793483586142150490179945617917214720.0000 Gradient: 1.4125792341e+105
135 w: 66663520321512444706053482355851832825439836039383997414747110246068276203135622022855002901295661056.0000 Val:47083752238976884042609275390344756179145111686568328195871050195633497235692800862579657440589317415964274885392949930155298915101371489122668740717190279910011740255413357240728890195074116244928651591680.0000 Gradient: -7.86382859622e+105
136 w: -371115817629859896914782044313325125186910720216969691663865085858377922127231310724629604715435393024.0000 Val:1459195589592978358745788772738711638306010867476578992430392298918429580185197068035800272522768490157459860217901423003948069795810007665368734963902762495851874055005240676491753919057559348112265314304000.0000 Gradient: 4.37779337951e+106
137 w: 2066001756745429723237480401638303552533607370124439838271581027795584344261059708920209574063534768128.0000 Val:45222644063719335985098344181396794029446832723239334864733502505609458385167822484854216384521896960838853284357649958082324438214828162911301652272861029028932564009584357194538764402902732559615157275197440.0000 Gradient: -2.43711757438e+107
138 w: -11501431779801807992371039135750213570880381034960258085721134841081599202935941802212715355035226079232.0000 Val:1401517076051672428693459135291694694848668346507267839569611947060256376750882201929791079625695828721534675207513953800871515002245075834796178419401118199801261864942773185261764573447514236925450692029579264.0000 Gradient: 1.35674335365e+108
139 w: 64028470718156674928593898108662074213930420573152690589830395828649722865359995052040399301341138124800.0000 Val:43435101045767582844291523756656965807372176155870635260329102838073008519551388371571406200467671212738625345377618975506422285800457528599163550191357728985441260326155036982568333368778427902955804323462774784.0000 Gradient: -7.5529902498e+108
140 w: -356446496487978198585300324319530296592694091359410882343732956355382310477437191562463767439845204426752.0000 Val:1346118456273794820892561775458194819846864291274113520768739326201347052665320721256807969157631489172602449182450393387103530278196467551855866216730025126854121108198289721979312747017883335304310200424154005504.0000 Gradient: 4.20474967206e+109
141 w: 1984337645948575353208018896238973911924302053382847002023098211490390618113026418260135650117250690580480.0000 Val:41718215330306296177518506054115829239307942778237625350702964738126472113126533761090271761721700244047802699084318870148448227963739115913854787253887744705957355796570899266592616697392828473849637928852340604928.0000 Gradient: -2.34078414244e+110
142 w: -11046807674995719811216021669993017490731979115957019083478616193906345452081748935592648318945758833278976.0000 Val:1292909611508819737693167575845120448779937475755167991895778910736312805664018776764760465545600790410171078422290877368313020898160091071377335166244478298020981599073868039600093677964582135944044991859921598283776.0000 Gradient: 1.30311453209e+111
143 w: 61497578326701175973852436666326276657677016803157348471353731782786178813688051224884728151850639513616384.0000 Val:40069194003069868450428510167320741189045438747047779303755303311787899898376651025399815211405985420383021836706671052913410462744639029639874568476266369157153121145399187022957786434889990547869559697394611731300352.0000 Gradient: -7.25443860017e+111
144 w: -342357018544745500163445685181056840597893078434360006084203863226247834168483266607238789003734455022518272.0000 Val:1241803985185005932676487252661754936970851431128278065851035953168006158111990268409459736895433703449583327091669260896553198503832082009725750084602588994515723689820817802904416693802621374352784846670209208830394368.0000 Gradient: 4.03854596871e+112
145 w: 1905901522238598215249893881379907238431904440715406734206981135726841977762574479382167325343495938392457216.0000 Val:38485354547017280348928524077634966049992507993488975951512981567473465923239631810301088437668719252640034453474473779848848423642748995754600903805458808521731908606754611419335653548961641654248829133895761433863389184.0000 Gradient: -2.24825854078e+113
146 w: -10610153774302276277593930091153468767510825475399089924674249656442561134356976980105926625042303906996027392.0000 Val:1192718442104985992968931107992143799924993952025488486112487153298836148917855206145361193259847028693973429385978192882211388853593836569733215282042347675166135899120466887783853407485651411677458522580794758866724192256.0000 Gradient: 1.25160552965e+114
147 w: 59066726061540781401082014529331594878002725377718064129744165863796874839134995451710971870531823284892205056.0000 Val:36964120478593815864310859583345021777940645890341249693229989084153399802943183767209074548779447840656123667137422227456230079644417742012081519717636861863516972311303193258004856734705303255798231529689684508755953188864.0000 Gradient: -6.96768798358e+114
148 w: -328824463984597584402337500482005264618561376405463191164248873351567712639941049937944800752125278584620187648.0000 Val:1145573133207015276232627733896267826904834443891551280265565337200120390863564610640296194674767915668225989532672504401622199332651583698049874304513795250678180565769514925521721490971598082284419298615654513571369035235328.0000 Gradient: 3.87891190046e+115
149 w: 1830565791002254941803163113419036240540018748361366749038730804453819990944804418737650635957640206042030669824.0000 Val:35503017156480765599188687085914665978294337694141602769014588477106831586473167578480633133078712837781168012277626428195058568052013136411635287868546578683226366200697564267099695083008857877123557104942331650318454929489920.0000 Gradient: -2.15939025499e+116
150 w: -10190759758509552696583634810303285478201798516823582408133944355648921431083981332090938552841917683468253265920.0000 Val:1100291365671885093562678189731915537580111542159044523240533683530251396949278765808240543448634232807293208807280762353142815899026090600477214741003981593383218466720743537140621729851846923247493914515615488843368559907176448.0000 Gradient: 1.20213255495e+117
151 w: 56731959575622692111766745995377712792598897225538260512592103943301181253128144936180382275644510125555334512640.0000 Val:34099667756015197304395357708433571387841958707930861526784518821523041045681020772009026646784909654380936778812821993578496113336449968553814504860631002086425272732662702006879594840129148604340112018243014566929382671466364928.0000 Gradient: -6.69227193341e+117
152 w: -315826818957491584914173889131349161669386732172824844298946888964780796999932786998315151299484704491698344951808.0000 Val:1056799478164199983818261870157748604511951184875841467345289355156691291832027661042967793042319975197840840391672130102789952437050142514418299029861630702413227269258510794072755732609693297977414241110455437092259504413929897984.0000 Gradient: 3.72558778533e+118
153 w: 1758207901136355926760249808969649574849535716903637912692989281949315027396562289364367409656182748045901387792384.0000 Val:32751789402731562033854677292542548185823373268830373475024744290355512108975240132078010573581453906738202911270095692254646935002770670039390695907721079179825690775779224473311225819163784149311504375532468807499153546041725288448.0000 Gradient: -2.07403472009e+119
154 w: -9787943385626093888286207817485459830805317440879689426655265803933961351948095366111892207029281772744707961520128.0000 Val:1015026721005072206589306519613417886591835963803789677992767771232365666484485969099972538026574665644958826580628597881518683077525880483131855947038901068769140812821032354137433233507003688744338209346308674332935001580468109836288.0000 Gradient: 1.15461512868e+120
155 w: 54489480827780472661741715102718090762098833399812835942611456921618968236783530905883303993453507317916398532952064.0000 Val:31457189458734765182227677427687854725614384793170893101402020314134362883105484460032915489689642073402110892224721148635137098427409506354715376744621950404389403438205601684790380738877577952300842810392563419688669260000305698832384.0000 Gradient: -6.42774242134e+120
156 w: -303342939768253906373699120282217746350399545922934022245149020675100058274881853504808611816857034402399353967738880.0000 Val:974905141081294833130845761673602566691963136499665758436607501614769740367785577816920960304153780235941468105448516553117902392830143770467527238740056781600789233801173457377051031383815838975501436783029422432127302752763738324992000.0000 Gradient: 3.57832420596e+121
157 w: 1688710145689870006324311848061564095750793101474046974513095218981051214163787747723920235471058184667786922829021184.0000 Val:30213761955864389847990276750577100463442739450901776511040630642562957414133401142502690119826789906429286072686462613642380180939844440889132012141057012902895253055164698305405023874798370337381727967594964254946059103587821272417959936.0000 Gradient: -1.99205308546e+122
158 w: -9401049381055506038520117059399788369692053283145151907543777438032371642051201468213704439806779905293250660623450112.0000 Val:936369471303790542464211824335768115687254751958862630168507998666691080040475447451203257780033962177790715921110987714865912233632689903797587295191498607214346372776954916883488824098958354273269927754419772142629468011984723165960994816.0000 Gradient: 1.10897595267e+123
159 w: 52335641904336010170563584541062563018638101551394684794733417053325940431659845639341261380263103322402992232389935104.0000 Val:29019484169847238026719653296204066706137265932993488228622674807348499515826528374471366038450609197644253656073558751137771335168044834497629280570029495465602902040252749647753695881697393438355049563800272810008426580933325691076143480832.0000 Gradient: -6.17366912854e+123
160 w: -291352518481438653129576086993003761078170441745153712338208834393661761229439118240007685473757487307087033881994461184.0000 Val:899357024435494880798682300254066179240693571806075056282463561097371130005037032600387714563920064863786926386282835585161064442552900153902865736558393189452917675224083674423708111002291188537653403455791163185528449581826670484696818778112.0000 Gradient: 3.43688160386e+124
161 w: 1621959470386169456511548090985785637148585717792696810776083371169181283094778370642634115715467751575829772063005999104.0000 Val:27872413329865390186047747393774028202609840098405675573664653900314116537560093102572983519770514990172814908625852309136282800731329025252672496080066226047330879288899078320296641119168558143643245367470207763647786637041894169292754988826624.0000 Gradient: -1.91331198887e+125
162 w: -9029448371639807548670532626062224512361726854276993386719037686440567422574802100488218236958728058706563859330705129472.0000 Val:863807591115977077728832898635564178366323382420044347676222804968080687740965971703626382543090974636713964645932678403738664608884703500653258514035566602249024299950116938723334335856657933746490554276355214400596030851825526273935922836799488.0000 Gradient: 1.0651407842e+126
163 w: 50266939084918818511868137970481726170372023492707867428254833599458703836088204310135891180651489096704432739200933036032.0000 Val:26770683458187315968685679330384717480065433330153647355634091596838092755475526315831992704888129417151511976912724217477999701079981786132257046131371935784048283369473075806995210381863236287510889283883459710039205905399154153574097831786971136.0000 Gradient: -5.92963874566e+126
164 w: -279836049885743023996357252953026369878464043571927223906195201227639982534705556688554506855186914973120845834853021646848.0000 Val:829663341916894424694891860364210430143051356962687855531402187333644058310637271063155729007136705063826500144538275558266045937633018018603072031860148835150442731015120142839405703068022791714252397788240944460881249472820135708295402863978348544.0000 Gradient: 3.30102988971e+127
165 w: 1557847289713931677918057920705548478221421245569333049984194481579637285825706017134119880638297279022613942654817598963712.0000 Val:25712502334720669504547636188875660445342307527225563235104989442563637790846300099885104836812111760813829502772793020554361006210874650561981712693818460936816582217602744441965651746227063094760127546631390334015566743534580400876645372346815741952.0000 Gradient: -1.8376833396e+128
166 w: -8672535861837457487452551118939649886453565522626886558230113211212069772196571504455592001908788076158849056219848624832512.0000 Val:796868733268970150954824485464656797832343360005220863712230938041723758001238294197408634836179422717728995144949872091893979203922308998748438509346083905171049646600970580288538952171482327493818214883896875324327779243387958270696611490454657040384.0000 Gradient: 1.02303831516e+129
167 w: 48280007142849126351676262872154174281462211154907360519692883555556283848129618273922475224954689225453791339009780664500224.0000 Val:24696148581549223077215510505225497389854389065341064716400057389507047948733818644161853114630872138329748027187187316238990407258794576115666585664388079223093169679436368922922008623896218172775873486603024250917094781267227865521033037397705629892608.0000 Gradient: -5.69525430047e+129
168 w: -268774799764241123487966222573656623055577713996579612329909095113969230392076607367571344187219294096835838876534807239589888.0000 Val:765370417107448287307174070563576126733807329098741274516139038707428856755298417549058621477537527663606323452793709188226284822703565976250158171817492074093384665024079868281049411655989984900341112483591556864293944984133047952836376494967658742022144.0000 Gradient: 3.17054806907e+130
169 w: 1496269310287530567207472084285514637309486198029415462880247259509599172168054915943312898048404991263510650949786817179680768.0000 Val:23719968862710896911964994468008291860622198966858535282942370233729785605614597699252832220973825564316367014036046688216645145905106972063630953364960844298473134430903181398731612889613273812080021842099604603657592188141410027051736812521997650569986048.0000 Gradient: -1.76504411005e+131
170 w: -8329731250370683471183750533787855908920358972320683776279866507378132434734923880972547592248061866807793897195766341397643264.0000 Val:735117154089047518286882508831139225537248620294197846549403590849883651223512581457775964306858987497378283938484835213125846973488761062456885166345542291261755054953026541493371987091463858731867152567839750414390922678786390356357819112549483521104674816.0000 Gradient: 9.82600056066e+131
171 w: 46371613870813603512888425101930942432434972829773890907873777981219000068275567243521738127929217843817042979148140256629358592.0000 Val:22782375194662027431667970284557793887261983618907959509293880969405832772247293392608002787258479242920771038027712681917210864157983048349078759053485319522671067817859178860531328692463051712238151803480791041247855911726389096325802102297594945798416629760.0000 Gradient: -5.47013451212e+132
172 w: -258150774418819340048812847336655347307877392361205483034481987858756105322973732381152125354057309893311998337873875768421908480.0000 Val:706059730239241504961136215745211221199112146923562917353031907847517496548114932278910574669060460320756951618892777562320536399544506402015333539109396470342787342060411486916610108463181628057575685958965816904804302788967255794943951538572280286077977100288.0000 Gradient: 3.0452238829e+133
173 w: 1437125361189567285675787175968066957390990993151044830879784495879790746783804959733567342259686860407782533410329585945168314368.0000 Val:21881842363052418164850885546159856378119630906236493955770602712671171030502860634185624454859719663664979565927396111324154336532195829881587869844507160705740336664691356990546570717815376097343033099566085108703090514423485274515154238180404916049356580192256.0000 Gradient: -1.69527613561e+134
174 w: -8000476885742321164625511242999156422118257075588905823563133553554425767202295905962276665490359090653810930707192558889845391360.0000 Val:678150876894273163769642068867422390032819441704080967194225405393467345698869726666063489418209405402084350330063760945912900628799905611660424805979592784678949816152193586612000356650643706673940186430166902074984781702118578008777573873577182853508213243904000.0000 Gradient: 9.43760224693e+134
175 w: 44538654822927501627955174616817709709839536388726335181677919964146638236402980776649080910555749472062768539610073669290059366400.0000 Val:21016905441609221355963891050339978519995848195934773584277515068839165618803046364269131067160576663295801086614896506106232295692766662735446346950201254817174964451159762518532259189554853557211259651773981086060784022559470828383332631058948888407181727230853120.0000 Gradient: -5.25391317087e+135
176 w: -247946691399237499329054331895625735422044947557381261810504045282793879908758403268302578956517686882621577393206213868383461441536.0000 Val:651345193807672236934993211339164586375744162345290001146611296042884699434434064638180240624290882913909931464614043198619106444371459862013113429310103094713109393746576278289774777402524681021446852929696110966009776330189992034262780400874365921225171199051431936.0000 Gradient: 2.92485346222e+136
177 w: 1380319231019555830293237070814057683966204649799551564473072406631921091294822519554476608677894413208894103351596258872651568644096.0000 Val:20186157409093355498337666853022077200119360151018085180644769880563623300495083599369590828278859978601558970833595005414778358287755527367072710141718254003178437675585250932502237143766647550596010733590451768306284541053451936444365400870474219284459066369597505536.0000 Gradient: -1.62826592242e+137
178 w: -7684237159085868723665769521210597849646794805396732624428061382411781534643528262184977522291565257453956803961283828860714278191104.0000 Val:625599075296185839371391194829006121644489829542663205217514835641681537204090263503862733036772048343760589581854912953593068421703607516423740826943068582618296185453519074995681370176574124495792266878898770784653507108063537313461653513860962090515299958004933197824.0000 Gradient: 9.06455639011e+137
179 w: 42778148264631036431103267986897094302362454095966075894168461056901542534011685197710115184993731119278557898088754479951726552023040.0000 Val:19388246860451921892974067093129066369125012852593249530748762871183621808327772268602127052199022087710348892959018675013735379212327556296404607195547528385156745468278875711864466131126909666265132003324756109551734811493075924345606617787322520073724025458738715426816.0000 Gradient: -5.04623854237e+138
180 w: -238145951389200966396356823134663841784484208655345516149286195895695605783808113218068862475489832166694357355444018808050563575447552.0000 Val:600870639304980578979786347380818671964741381048898109674228804204881038408029807428525864078435723066521995080744772969054090522605983966812356728976518688082818286158125748940390828988985961560389701758342763093374791945403673320727221975588381859045020637798920911585280.0000 Gradient: 2.80924099654e+139
181 w: 1325758511383681730462294207288330936618021950901612270774649658666524519736141467553208348140937761047843688625116519260397349321572352.0000 Val:18621875808443269960703026513897361683601571461135665395620502297056930867687031487307122712400267789284595572030457686567871322903919563949106166596318618027883514331565217272958520483046820883594258781386059381785672949952224271985352847168465945055510061134298882940862464.0000 Gradient: -1.56390446277e+140
182 w: -7380497632872958195695591073610886028610856192504607569798103764755723262443598127573360795408841801714611614162894712336054012042280960.0000 Val:577119659276735652468693517376842566710892337487643220831525753205803597386106523383570771182189258411907071747939073342195057300844364545577410813328428410829448877053709526067208592388732244041911139174287279460679113740160232370122302911388899187085587491106383925300166656.0000 Gradient: 8.70625614426e+140
183 w: 41087230322203751961836403584300300812554260876309117604935380190943622531061384034317463716864301440662354086107837002791557211671756800.0000 Val:17885797572158709216802879731047382381023450030684589946573342137749554718891008705072375855699573362188475441950844233965491067791919897682718603273668810596769740248749907321571942468411426252952891532177752731545974836737990154861803742780294488328435888615901298770309021696.0000 Gradient: -4.84677279551e+141
184 w: -228732611203708321773691625118374876012806475138771973724732885434578218561254102563723455537455574157415059151209269835643103432193081344.0000 Val:554307498713783222041877540117196866589367503242085494180586601530124923277144176001544785919475197903058584681102180063738222795838634693864102640874360292971972287463861315564628695829535131491596024375361106112677298840372389279801421419196471767251383238314373454040494243840.0000 Gradient: 2.69819841526e+142
185 w: 1273354446571044635185674020617378488095038703317275151851277362175809241358808273280621669781115555476072475599112756138802976246430957568.0000 Val:17178814749005728599100839009488709768707640986775380832152957272644140415373317528008219447908008769183675586130590673005001354463207759441067039732986589663313326384830324708048708756940353708706791841690576030390943114971339822311933630517003836545160524993419673495881939681280.0000 Gradient: -1.50208705777e+143
186 w: -7088764204061006194052757753503941317582121288563367546062690578870759558357291782065040623469855713588507483914476639737677883638967959552.0000 Val:532397048326849254259444943144156633493102497383274907244930817316071105359244612121258852170770876167084624871522428937828044948869615197397279363659111440708979707950999467651941773667701026485767236244140990418863353738190643083704044048343230693430403909212492109300039466090496.0000 Gradient: 8.36211865063e+143
187 w: 39463150324007627984438211434641660606018639023768214881119460089243591345180593669811443751215084294108295047991433214326814256347886911488.0000 Val:16499777266854021839446897569376292751882718468175305318754363116572123716677552337740287379214958084841460152274778460061184551467070299384072638163375417262677864980293014206853480835031153271280467345885086588187676737360814039569188941745634310512252641182039722328399678940381184.0000 Gradient: -4.65519145281e+144
188 w: -219691357853750524738250206194479492263618964679295316093720899360412796842317077041733342465779055329202474915394614368841482634799903408128.0000 Val:511352665668156565773039457521280299129930728400502991076536819395026076196201266098374633175514741621614667797026276023249499516567638346888950995691687663777703514900662912079066049446756025375150681525578197839224351483002091577106771136794269865899816961731007853110438738310201344.0000 Gradient: 2.59154508178e+145
189 w: 1223021789171828861780184131044459158435818747461644365176825769891236417754216695123076262954596265629901527865035842173284549762597133484032.0000 Val:15847580513175358217289807022239422186127792456453278281161972091436856532886936622988535697124164793802863529561482363722930193085883223884049405612445031826192914254606784393365020653698216175618312778854468761389564730727905126544051834344679534361277513847566719599772744697350455296.0000 Gradient: -1.44271314703e+146
190 w: -6808562300319572372850495812456958225672514258224426910620495118193038058745839312000410642864334045751514225820620068086518497053622541484032.0000 Val:491140117150688269735440566567658924313342950072898271814019029984959202289339372919886720351774679270454636395216689340530903420362762103676318514326564330436871740880039453715335716267285786483688879359709954810544101387391174329388915015536473061385003459223389825698763643452926197760.0000 Gradient: 8.03158408949e+146
191 w: 37903266325879063545124825128345884503404937581649788870394181479597026257807851044618417152744525830160413821547390031445796892930396053831680.0000 Val:15221163538134275583223233316222122148241272931333373271000898078542194654967627863224323190801177861002411537056567907126013231011426946892939393513005324793247060268361897418613369080545031144669011434426421592468912965229505434079575284915185105907353293352723092460380262747289025511424.0000 Gradient: -4.47118286262e+147
192 w: -211007483636168749300934034764468304659077683117140743030401254833310766915742484282718881939024171446479529347963698636648339490169449909059584.0000 Val:471726522359289537254063485075879320964689312731439539087719210469506409036220651828795460449081290085947983525759581220647262982168012183550501271288419140398265078974168612090177583542366968467398439401164421463359091941445420597053531556978550926003358276799321392941978667109249751973888.0000 Gradient: 2.48910749962e+148
193 w: 1174678661402551469251605931140482070979687275450027848060160221413519604886403323350939838380172210562164832109847848125491634061730420747141120.0000 Val:14619507328706176405282432440283534663720739955255605158757138876252106269037752430670174251693391127562212928493856954496581725746913600106510048241287530070531217023083176903412631469505931460695895458733199595491303286046878841280502764501921466598356698786306943545158104037478304451657728.0000 Gradient: -1.38568614504e+149
194 w: -6539436108028003577031946365201139720415105274089775907082114031928984815718125713334581244469738937803746624617754293377670638742695597697925120.0000 Val:453080300563016945883598938676535054675087446625037677833824129947134586279095326960549303364920202748487677953364067784743761814033617986415715206876887787075484072603886484240264394430353208226189732689668862544282842245088373715762361973125502735547801002741657164979178945247518461576347648.0000 Gradient: 7.71411476943e+149
195 w: 36405040813391896370827344945009196653989231444466593541738246252758393372231936647546129575891264468349234466430068537303042254512175744165085184.0000 Val:14041633151015428655258927522138202170220134270300321643323719395020128949118973979049110482148541716501888120570861333436823684606735359744616275856588455021959834291742915844513060736519435238504197548182163684340849590582105116990965151207630392839491810585104475046123480829848860182773235712.0000 Gradient: -4.29444769214e+150
196 w: -202666862208152745122827187869279579029810097197801271764611930354195261071011854419684033105119708275884559222049281969369299231123233604538204160.0000 Val:435171119341730004102248204741640289268794415491481971240738969613752006651209556381944779143094627391097392506992536880004478554396372838644890233671339160768992240079207323759746430449968614268178690016355268194281135026084412774734750723139968752493233045340942187700494518070375330113705213952.0000 Gradient: 2.39071903022e+151
197 w: 1128246421912786178825598263869633978964216711670492553947353089506664742958211777178115116801216974322064171752312475499692746873273590770772738048.0000 Val:13486600958196917050377611802878677054149185532161645557076191688483562374866785663351469796842686757104483565615354862014758890395828305635039201472369558155255020565015381813958083002221393628315621818596914690214718633700390525602219204161941064683849924803945408920535358956255197974983801307136.0000 Gradient: -1.33091328412e+152
198 w: -6280947830788480292683136503768949878127742427586106014844778523032154396815979827956236688936932913958938991338021306739664418698186484277156249600.0000 Val:417969845243349223922448358230562731997891770810016153803489901874444600461948179229689404332458596211026124701651773395707836793680017978697658702614368096453731275876131053119505504906889173379306432795598509872964556473726943572528304875330451997621377113726800308047323562903794551647285867970560.0000 Gradient: 7.4091942527e+152
199 w: 34966036573999470286229430102931766807988963892812451840450405945066333797624500548372508006080017180772826938640577009672216823176324229723097923584.0000 Val:12953507861190959800553404930528759307989935458020755703354106785442377957617473189593795944921245687068103479170007063790343264636214888547195381911737363633413639067466448107774246135865823444765899933805755871539668416626312522275979120578086963578442896590341188494637279147187167603276752732815360.0000 Gradient: -4.12469844048e+153
200 w: -194655925607455054127076395370874999995368007522305907001660607774328657680115803621716771044701259573787352431297022998538290280862217083544630460416.0000 Val:401448496391513134796013690931086108653672504178068772430301003236141000146729350494382718754898466616677643332707066784728765774729677854008938928488068359216215076955102923789952366904595956127067335915063354535272314673719203437000345117890579272613674144855307234432547222977123386618720276688404480.0000 Gradient: 2.29621962181e+154
201 w: 1083649537856702209462239198769232449383221013720197476380670710795802762537616115409492546414021520302570578004533468982226002130911646032808835547136.0000 Val:12441486659984120306643511043239649756571126082074664216939354564405137390588689076568796579241141939426690301558157374279329817607033422163745544997162072461159385840336569621464832287216673153182673531722856132984439948254408387604933309673281491409881942910994222222579090972364697918617248305738940416.0000 Gradient: -1.27830546346e+155
Out[21]:
1.0836495378567022e+150

In [19]:
# Exercise: 
# 1. Consider a set of 100 data points and explain the behavior of the algorithm. 
# 2. How could we fix this behavior?

Stochastic Gradient Descend

The last function evals the whole dataset $(\mathbf{x}_i,y_i)$ at every step.

If the dataset is large, this strategy is too costly. In this case we will use a strategy called SGD (Stochastic Gradient Descend).

When learning from data, the cost function is additive: it is computed by adding sample reconstruction errors.

Then, we can compute the estimate the gradient (and move towards the minimum) by using only one data sample (or a small data sample).

Thus, we will find the minimum by iterating this gradient estimation over the dataset.

A full iteration over the dataset is called epoch. During an epoch, data must be used in a random order.

If we apply this method we have some theoretical guarantees to find the minimum.


In [25]:
import numpy as np
x = range(10)
y = [2*i for i in x]
data = zip(x,y)

def in_random_order(data):
    import random
    indexes = [i for i,_ in enumerate(data)]
    random.shuffle(indexes)
    for i in indexes:
        yield data[i]
        
for (x_i,y_i) in in_random_order(data):
    print x_i,y_i


 5 10
0 0
9 18
3 6
1 2
7 14
2 4
6 12
8 16
4 8

In [26]:
def gradient_f_SGD(x,y,w):
    import numpy as np
    return 2*w*(np.array(x)**2) - 2*np.array(x)*np.array(y)

def SGD(target_f, gradient_f, x, y, alpha_0=0.01):
    import numpy as np
    import random
    data = zip(x,y)
    w = random.random()
    alpha = alpha_0
    min_w, min_val = float('inf'), float('inf')
    iteration_no_increase = 0
    while iteration_no_increase < 100:
        val = sum(target_f(x_i, y_i, w) for x_i,y_i in data)
        if val < min_val:
            min_w, min_val = w, val
            iteration_no_increase = 0
            alpha = alpha_0
        else:
            iteration_no_increase += 1
            alpha *= 0.9
        for x_i, y_i in in_random_order(data):
            gradient_i = gradient_f(x_i, y_i, w)
            w = np.array(w) - (alpha *  np.array(gradient_i))
    return min_w

In [27]:
print "w:", SGD(target_f, gradient_f_SGD, x, y, 0.01)


w: 2.0

Exercise: Gradient Descent and Linear Regression

The linear regression model assumes a linear relationship between data:

$$ y_i = w_1 x_i + w_0 $$

Let's generate a more realistic dataset (with noise), where $w_1 = 2$ and $w_0 = 0$:


In [28]:
import numpy as np
x = np.random.uniform(0,1,20)

def f(x): return x*2

noise_variance =0.2
noise = np.random.randn(x.shape[0])*noise_variance
y = f(x) + noise

plt.plot(x, y, 'o', label='y')
plt.plot([0, 1], [f(0), f(1)], 'b-', label='f(x)')
plt.xlabel('$x$', fontsize=15)
plt.ylabel('$t$', fontsize=15)
plt.ylim([0,2])
plt.title('inputs (x) vs targets (y)')
plt.grid()
plt.legend(loc=2)
plt.gcf().set_size_inches((10,6))
plt.show()



In [29]:
# Our model y = x * w
def nn(x, w): return x * w

# Our cost function
def cost(y, t): return ((t - y)**2).sum()

ws = np.linspace(0, 4, num=100)  
cost_ws = np.vectorize(lambda w: cost(nn(x, w) , y))(ws)  

# Ploting the cost function
plt.plot(ws, cost_ws, 'r-')
plt.xlabel('$w$', fontsize=15)
plt.ylabel('Cost', fontsize=15)
plt.title('Cost vs. $w$')
plt.grid()
plt.gcf().set_size_inches((10,6))
plt.show()


Complete the following code and look at the plot of the first gradient descent updates. Explore the behavior of the proposed learning rates.


In [30]:
def gradient(w, x, y): 
    return 2 * x * (nn(x, w) - y)

def step(w_k, x, y, learning_rate):
    return learning_rate * gradient(w_k, x, y).sum()

w = 0.01

# define a learning_rate 
learning_rate = 0.1

nb_of_iterations = 20  
w_cost = [(w, cost(nn(x, w), y))] 
for i in range(nb_of_iterations):
    # Here your code 
    w_cost.append((w, cost(nn(x, w), y)))  
    
for i in range(0, len(w_cost)):
    print('w({}): {:.4f} \t cost: {:.4f}'.format(i, w_cost[i][0], w_cost[i][1]))


w(0): 0.0100 	 cost: 35.8301
w(1): 0.0100 	 cost: 35.8301
w(2): 0.0100 	 cost: 35.8301
w(3): 0.0100 	 cost: 35.8301
w(4): 0.0100 	 cost: 35.8301
w(5): 0.0100 	 cost: 35.8301
w(6): 0.0100 	 cost: 35.8301
w(7): 0.0100 	 cost: 35.8301
w(8): 0.0100 	 cost: 35.8301
w(9): 0.0100 	 cost: 35.8301
w(10): 0.0100 	 cost: 35.8301
w(11): 0.0100 	 cost: 35.8301
w(12): 0.0100 	 cost: 35.8301
w(13): 0.0100 	 cost: 35.8301
w(14): 0.0100 	 cost: 35.8301
w(15): 0.0100 	 cost: 35.8301
w(16): 0.0100 	 cost: 35.8301
w(17): 0.0100 	 cost: 35.8301
w(18): 0.0100 	 cost: 35.8301
w(19): 0.0100 	 cost: 35.8301
w(20): 0.0100 	 cost: 35.8301

In [31]:
# Plotting the first gradient descent updates
plt.plot(ws, cost_ws, 'r-')  # Plot the error curve
# Plot the updates
for i in range(1, len(w_cost)-2):
    w1, c1 = w_cost[i-1]
    w2, c2 = w_cost[i]
    plt.plot(w1, c1, 'bo')
    plt.plot([w1, w2],[c1, c2], 'b-')
    plt.text(w1, c1+0.5, '$w({})$'.format(i)) 
# Plot the last weight, axis, and show figure
w1, c1 = w_cost[len(w_cost)-3]
plt.plot(w1, c1, 'bo')
plt.text(w1, c1+0.5, '$w({})$'.format(nb_of_iterations))  
plt.xlabel('$w$', fontsize=15)
plt.ylabel('$\\xi$', fontsize=15)
plt.title('Gradient descent updates plotted on cost function')
plt.grid()
plt.gcf().set_size_inches((10,6))
plt.show()



In [32]:
w = 0
nb_of_iterations = 10  
for i in range(nb_of_iterations):
    dw = step(w, x, y, learning_rate)  
    w = w - dw  
    

plt.plot(x, y, 'o', label='t')
plt.plot([0, 1], [f(0), f(1)], 'b-', label='f(x)')
plt.plot([0, 1], [0*w, 1*w], 'r-', label='fitted line')
plt.xlabel('input x')
plt.ylabel('target t')
plt.ylim([0,2])
plt.title('input vs. target')
plt.grid()
plt.legend(loc=2)
plt.gcf().set_size_inches((10,6))
plt.show()


Mini-batch Gradient Descent

In code, general batch gradient descent looks something like this:

nb_epochs = 100
for i in range(nb_epochs):
    grad = evaluate_gradient(target_f, data, w)
    w = w - learning_rate * grad

For a pre-defined number of epochs, we first compute the gradient vector of the target function for the whole dataset w.r.t. our parameter vector.

Stochastic gradient descent (SGD) in contrast performs a parameter update for each training example and label:

nb_epochs = 100
for i in range(nb_epochs):
    np.random.shuffle(data)
    for example in data:
        grad = evaluate_gradient(target_f, example, w)
        w = w - learning_rate * grad

Mini-batch gradient descent finally takes the best of both worlds and performs an update for every mini-batch of $n$ training examples:

nb_epochs = 100
for i in range(nb_epochs):
  np.random.shuffle(data)
  for batch in get_batches(data, batch_size=50):
    grad = evaluate_gradient(target_f, batch, w)
    w = w - learning_rate * grad

Minibatch SGD has the advantage that it works with a slightly less noisy estimate of the gradient. However, as the minibatch size increases, the number of updates done per computation done decreases (eventually it becomes very inefficient, like batch gradient descent).

There is an optimal trade-off (in terms of computational efficiency) that may vary depending on the data distribution and the particulars of the class of function considered, as well as how computations are implemented.

Loss Funtions

Loss functions $L(y, f(\mathbf{x})) = \sum_i \ell(y_i, f(\mathbf{x_i}))$ represent the price paid for inaccuracy of predictions in classification/regression problems.

In classification this function is often the zero-one loss, that is, $ \ell(y_i, f(\mathbf{x_i}))$ is zero when $y_i = f(\mathbf{x}_i)$ and one otherwise.

This function is discontinuous with flat regions and is thus extremely hard to optimize using gradient-based methods. For this reason it is usual to consider a proxy to the loss called a surrogate loss function. For computational reasons this is usually convex function. Here we have some examples:

Square / Euclidean Loss (Linear Regression)

In regression problems, the most common loss function is the square loss function:

$$ L(y, f(\mathbf{x})) = \sum_i (y_i - f(\mathbf{x}_i))^2 $$

The square loss function can be re-written and utilized for classification:

$$ L(y, f(\mathbf{x})) = \sum_i (1 - y_i f(\mathbf{x}_i))^2 $$

Hinge / Margin Loss (Suport Vector Machines)

The hinge loss function is defined as:

$$ L(y, f(\mathbf{x})) = \sum_i \mbox{max}(0, 1 - y_i f(\mathbf{x}_i)) $$

The hinge loss provides a relatively tight, convex upper bound on the 0–1 Loss.

Logistic Loss (Logistic Regression)

This function displays a similar convergence rate to the hinge loss function, and since it is continuous, gradient descent methods can be utilized.

$$ L(y, f(\mathbf{x})) = log(1 + exp(-y_i f(\mathbf{x}_i))) $$

Sigmoid Cross-Entropy Loss (Softmax classifier)

Cross-Entropy is a loss function that is very used for training multiclass problems. We'll focus on models that assume that classes are mutually exclusive. In this case, our labels have this form $\mathbf{y}_i =(1.0,0.0,0.0)$. If our model predicts a different distribution, say $ f(\mathbf{x}_i)=(0.4,0.1,0.5)$, then we'd like to nudge the parameters so that $f(\mathbf{x}_i)$ gets closer to $\mathbf{y}_i$.

C.Shannon showeed that if you want to send a series of messages composed of symbols from an alphabet with distribution $y$ ($y_j$ is the probability of the $j$-th symbol), then to use the smallest number of bits on average, you should assign $\log(\frac{1}{y_j})$ bits to the $j$-th symbol.

The optimal number of bits is known as entropy:

$$ H(\mathbf{y}) = \sum_j y_j \log\frac{1}{y_j} = - \sum_j y_j \log y_j$$

Cross entropy is the number of bits we'll need if we encode symbols by using a wrong distribution $\hat y$:

$$ H(y, \hat y) = - \sum_j y_j \log \hat y_j $$

In our case, the real distribution is $\mathbf{y}$ and the "wrong" one is $f(\mathbf{x}_i)$. So, minimizing cross entropy with respect our model parameters will result in the model that best approximates our labels if considered as a probabilistic distribution.

Cross entropy is used in combination with Softmax classifier. In order to classify $\mathbf{x}_i$ we could take the index corresponding to the max value of $f(\mathbf{x}_i)$, but Softmax gives a slightly more intuitive output (normalized class probabilities) and also has a probabilistic interpretation:

$$ P(\mathbf{y}_i = j \mid \mathbf{x_i}) = - log \left( \frac{e^{f_j(\mathbf{x_i})}}{\sum_k e^{f_k(\mathbf{x_i})} } \right) $$

where $f_k$ is a linear classifier.

Advanced gradient descend

See "An Interactive Tutorial on Numerical Optimization": http://www.benfrederickson.com/numerical-optimization/

Momentum

SGD has trouble navigating ravines, i.e. areas where the surface curves much more steeply in one dimension than in another, which are common around local optima. In these scenarios, SGD oscillates across the slopes of the ravine while only making hesitant progress along the bottom towards the local optimum.

Momentum is a method that helps accelerate SGD in the relevant direction and dampens oscillations. It does this by adding a fraction of the update vector of the past time step to the current update vector:

$$ v_t = m v_{t-1} + \alpha \nabla_w f $$$$ w = w - v_t $$

The momentum $m$ is commonly set to $0.9$.

Nesterov

However, a ball that rolls down a hill, blindly following the slope, is highly unsatisfactory. We'd like to have a smarter ball, a ball that has a notion of where it is going so that it knows to slow down before the hill slopes up again.

Nesterov accelerated gradient (NAG) is a way to give our momentum term this kind of prescience. We know that we will use our momentum term $m v_{t-1}$ to move the parameters $w$. Computing $w - m v_{t-1}$ thus gives us an approximation of the next position of the parameters (the gradient is missing for the full update), a rough idea where our parameters are going to be. We can now effectively look ahead by calculating the gradient not w.r.t. to our current parameters $w$ but w.r.t. the approximate future position of our parameters:

$$ w_{new} = w - m v_{t-1} $$$$ v_t = m v_{t-1} + \alpha \nabla_{w_{new}} f $$$$ w = w - v_t $$

Adagrad

All previous approaches manipulated the learning rate globally and equally for all parameters. Tuning the learning rates is an expensive process, so much work has gone into devising methods that can adaptively tune the learning rates, and even do so per parameter.

Adagrad is an algorithm for gradient-based optimization that does just this: It adapts the learning rate to the parameters, performing larger updates for infrequent and smaller updates for frequent parameters.

$$ c = c + (\nabla_w f)^2 $$$$ w = w - \frac{\alpha}{\sqrt{c}} $$

RMProp

RMSProp update adjusts the Adagrad method in a very simple way in an attempt to reduce its aggressive, monotonically decreasing learning rate. In particular, it uses a moving average of squared gradients instead, giving:

$$ c = \beta c + (1 - \beta)(\nabla_w f)^2 $$$$ w = w - \frac{\alpha}{\sqrt{c}} $$

where $\beta$ is a decay rate that controls the size of the moving average.

(Image credit: Alec Radford)

(Image credit: Alec Radford)


In [ ]: